library(lubridate) # package to help calculate age of patients
library(tableone) # package to help create descriptive table
library(magrittr) # package to add additional pipe options
library(labelled) # package to help with labeling columns
library(MatchIt) # package for the matching procedure
library(ggplot2) # package for creating fancy plots
library(ggpubr) # package to combine fancy plots
library(table1) # package to create table one
library(tidyr) # package to make life easier
library(dplyr) # package to make life easier1 Data Preperation
We examined the data prior to conducting any statistical analysis. We started by reformatting and re-coding variables, followed by a matching procedure. Lastly, we present the descriptive statistics of the matched dataset.
1.1 Data Inspection
The following R packages were utilized.
We performed feature reduction on the raw data, retaining only the variables veteran, starting_date, age, sex, pre_score, post_score, pre_diagnosis, post_diagnosis and duration. We re-coded the variables veteran, starting_date, and age. We created a missingness indicator follow_up_miss for the follow-up treatment score. There was one veteran removed without the pre-treatment diagnosis of PTSD.
# read in the raw data
load("data/data_raw.Rdata")
# select relevant columns, clean dates, list-wise omit missing data
vet <- data %>%
dplyr::select(
IsVeteraan, geslacht, StartDatum, geboortedatum, BEHDAGEN_GEPLAND,
CAPS5Score_IN, CAPS5Score_TK, CAPS5Score_FU, diagnose_IN, diagnose_TK) %>%
dplyr::mutate(
id = row_number(),
veteran = as.factor(IsVeteraan),
sex = geslacht,
starting_date = as.Date(StartDatum, "%Y-%m-%d"),
age = trunc((as.Date(geboortedatum, "%Y-%m-%d") %--%
Sys.Date()) / years(1)),
pre_score = CAPS5Score_IN,
post_score = CAPS5Score_TK,
follow_up_miss = !is.na(CAPS5Score_FU),
pre_diagnosis = factor(
diagnose_IN, levels = c(0, 1),
labels = c("No PTSD", "PTSD")),
post_diagnosis = factor(
diagnose_TK, levels = c(0, 1),
labels = c("No PTSD", "PTSD")),
duration = BEHDAGEN_GEPLAND,
.keep = "none") %>%
dplyr::filter(id != 47) %>%
dplyr::filter(pre_diagnosis == "PTSD") %>%
na.omit()The description of the resulting dataset variables is as follows.
| Variable | Explanation | Coding |
|---|---|---|
veteran |
being a veteran | factor |
starting_date |
date start treatment | date |
sex |
sex of patient | factor |
age |
age of patient | continuous |
pre_score |
pre-treatment CAPS score | continuous |
post_score |
post-treatment CAPS score | continuous |
follow_up_miss |
missingness indicator for follow-up score | factor |
pre_diagnosis |
diagnosis of PTSD pre-treatment | factor |
post_diagnosis |
diagnosis of PTSD post-treatment | factor |
duration |
treatment duration | factor |
1.2 Matching
Before making statistical inferences, we took into account the class imbalance of veteran and the potential confounding effects of age, sex, and duration.
Warning in ModuleReturnVarsExist(vars, data): The data frame does not have:
stay Dropped
Stratified by veteran
0 1 p test
n 3848 43
sex (mean (SD)) 1.63 (0.48) 1.07 (0.26) <0.001
age (mean (SD)) 41.77 (12.49) 46.93 (8.29) 0.007
We noticed a significant class imbalance with only 43 veterans and 3848 non-veterans. We should be careful with the confounding effect of age and sex, as both standardized mean differences are greater than 0.5. Therefore, we used a nearest neighbor matching procedure. The algorithm matched each veteran with the most similar non-veteran based on sex, age, starting_date, and duration.
# set seed to make stochastic results reproducible
set.seed(123)
# matching non-veterans
match <- matchit(
veteran ~ starting_date + sex + age + duration,
method = "nearest",
m.order = "largest",
replace = FALSE,
data = vet)
# matched data set
vet <- match.data(match)After the matching procedure, we observed a significant decrease in the standardized mean difference. This suggests that the matching procedure was successful in controlling the confounding effects of age, sex, and duration.
Stratified by veteran
0 1 p test
n 43 43
sex (mean (SD)) 1.07 (0.26) 1.07 (0.26) 1.000
age (mean (SD)) 46.95 (11.18) 46.93 (8.29) 0.991
duration (mean (SD)) 7.00 (1.89) 7.14 (1.73) 0.722
After the matching procedure, we added auxiliary data to describe the sample in more detail.
2 Patient Characteristics
The following table presents the patient characteristics.
| Total (N=86) |
Non-Veteran (N=43) |
Veteran (N=43) |
p-value | |
|---|---|---|---|---|
| Sex | ||||
| Male | 80 (93.0%) | 40 (93.0%) | 40 (93.0%) | 1 |
| Female | 6 (7.0%) | 3 (7.0%) | 3 (7.0%) | |
| Pre-treatment CAPS score | ||||
| Mean (SD) | 43.1 (7.69) | 42.3 (8.04) | 43.9 (7.33) | 0.343 |
| Median [Min, Max] | 43.0 [25.0, 61.0] | 41.0 [29.0, 61.0] | 45.0 [25.0, 61.0] | |
| Post-treatment CAPS score | ||||
| Mean (SD) | 15.2 (15.7) | 16.8 (17.5) | 13.5 (13.7) | 0.34 |
| Median [Min, Max] | 10.5 [0, 68.0] | 12.0 [0, 68.0] | 8.00 [0, 48.0] | |
| Difference post and pre-score | ||||
| Mean (SD) | -28.0 (15.7) | -25.6 (17.0) | -30.4 (14.0) | 0.153 |
| Median [Min, Max] | -30.0 [-61.0, 18.0] | -30.0 [-61.0, 18.0] | -31.0 [-51.0, 2.00] | |
| Age | ||||
| Mean (SD) | 46.9 (9.78) | 47.0 (11.2) | 46.9 (8.29) | 0.991 |
| Median [Min, Max] | 46.0 [24.0, 68.0] | 46.0 [24.0, 68.0] | 46.0 [33.0, 67.0] | |
| War violence | ||||
| No | 40 (46.5%) | 39 (90.7%) | 1 (2.3%) | <0.001 |
| Yes | 46 (53.5%) | 4 (9.3%) | 42 (97.7%) | |
| Sexual violence | ||||
| No | 41 (47.7%) | 13 (30.2%) | 28 (65.1%) | 0.004 |
| Yes | 45 (52.3%) | 30 (69.8%) | 15 (34.9%) | |
| Physical violence | ||||
| No | 13 (15.1%) | 5 (11.6%) | 8 (18.6%) | 0.549 |
| Yes | 73 (84.9%) | 38 (88.4%) | 35 (81.4%) | |
| Natural disasters and serious accidents | ||||
| No | 15 (17.4%) | 12 (27.9%) | 3 (7.0%) | 0.019 |
| Yes | 71 (82.6%) | 31 (72.1%) | 40 (93.0%) | |
| Mood disorders | ||||
| No | 34 (39.5%) | 16 (37.2%) | 18 (41.9%) | 0.823 |
| Yes | 52 (60.5%) | 27 (62.8%) | 25 (58.1%) | |
| Anxiety disorders | ||||
| No | 46 (53.5%) | 21 (48.8%) | 25 (58.1%) | 0.5 |
| Yes | 40 (46.5%) | 22 (51.2%) | 18 (41.9%) | |
| Suicide risk | ||||
| None | 25 (29.1%) | 10 (23.3%) | 15 (34.9%) | 0.493 |
| Low | 36 (41.9%) | 19 (44.2%) | 17 (39.5%) | |
| Medium | 16 (18.6%) | 8 (18.6%) | 8 (18.6%) | |
| High | 7 (8.1%) | 5 (11.6%) | 2 (4.7%) | |
| Missing | 2 (2.3%) | 1 (2.3%) | 1 (2.3%) | |
| Treatment duration | ||||
| 8 days or more | 67 (77.9%) | 33 (76.7%) | 34 (79.1%) | 1 |
| Less than 8 days | 19 (22.1%) | 10 (23.3%) | 9 (20.9%) |
Including p-values in Table 1 may be easy, but it is not recommended. This is true whether the data is from an observational study or a randomized trial. If p-values are included to show crude associations between predictors and outcomes, they may not be relevant as adjusted assessments from regression analysis are usually more important. Including p-values in Table 1 to show differences in participant characteristics between exposure groups is not useful. What matters for confounding is the magnitude of differences in the sample, not if the groups differ in the population (as tested by p-values).
3 Missing Data
There are 46 veterans records in total, with one being a re-registration and another without PTSD diagnosis. Excluding these records, there are 45 veterans. Of those 45, 43 have a CAPS score after treatment and 23 have a follow-up CAPS score. The next section examines whether there are any differences between veterans who have a follow-up CAPS score and those who do not.
3.1 Missing Data Mechanisms
First, we will explain some technical terms. Then, we will give a more understandable explanation.
The variable \(R\) represents whether or not a follow-up CAPS score is missing, where \(R = 0\) represents a missing score and \(R = 1\) represents a known score. The distribution of \(R\) may depend on the data, represented by \(Y\). This data may contain complete and missing information, represented by \(Y = (Y_{obs}, Y_{mis})\). The missing data may be due to either design or circumstance, such as a forgotten confounding variable. The parameter of the missing data model is represented by \(\psi\), and the general expression of the missing data model is \(P(R=0|Y_{obs}, Y_{mis},\psi)\), which expresses the probability that the follow-up score is missing, given the observed and missing data, and the parameter of the missing data model.
We can now formulate the hypothesis that:
\[H_0: P(R=0|Y_{obs}, Y_{mis},\psi)=P(R=0|\psi).\]
This implies that the observations with missing follow-up scores are a random subset of all observations, determined by \(\psi\). Therefore, there are no systematic differences between the missing and observed follow-up scores. This is known as missing completely at random (MCAR).
The alternative hypothesis is formulated as:
\[H_1: P(R=0|Y_{obs}, Y_{mis},\psi) \ne P(R=0|\psi)\]
This means that the observations with missing follow-up scores are not a random subset of all observations. The missingness may depend on another observed variable, such as veterans with higher pre-treatment scores being less likely to be lost to follow-up. This is known as a missing at random (MAR) mechanism. If the missingness depends on something that was not measured, it is called missing not at random (MNAR).
It is not possible to definitively distinguish between MCAR, MAR, and MNAR based on the data alone. However, by analyzing the standardized mean differences between the two groups, we can infer which mechanism is most likely to be the cause of missing data. If the standardized mean differences are small, it suggests that the missing data is likely MCAR. If the standardized mean differences are larger, it suggests that the missing data may be MAR or MNAR.
3.2 Exploration
There are minimal variations in age, duration, pre_score, and post_score between the veterans with and without follow up scores. This is indicated by the SMD being less than 0.5.
Stratified by follow_up_miss
FALSE TRUE p test
n 20 23
age (mean (SD)) 44.95 (7.86) 48.65 (8.45) 0.146
duration = Less than 8 days (%) 4 (20.0) 5 (21.7) 1.000
pre_score (mean (SD)) 42.55 (8.68) 45.13 (5.84) 0.254
post_score (mean (SD)) 12.05 (13.86) 14.83 (13.78) 0.515
This is also supported by the figure which shows that the distributions of the groups overlap nicely.
In conclusion, the reason for the missing follow-up score appears to be random, supporting the assumption that \(H_0\) is true.
4 Marginal Analysis
In this chapter, we investigate the effectiveness of the treatment for both veterans and non-veterans by examining the pre- and post-treatment scores. The figure presents the CAPS scores for both groups, which show a decrease in the CAPS scores. Note, the error bars in the figure below shows the 95% confidence interval of the paired t-test. The following sections will determine if this decrease is statistically significant and clinically meaningful.
4.1 Veterans
For the veteran group, the following hypotheses can be formulated:
Null Hypothesis (\(H_0\)): The treatment has no effect on the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being greater than or equal to zero (\(CAPS_{post} - CAPS_{pre} \ge 0\)).
Alternative Hypothesis (\(H_1\)): The treatment does decrease the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being less than zero (\(CAPS_{post} - CAPS_{pre} < 0\)).
These hypotheses can be tested using a paired t-test. The effect size can be quantified by Cohen’s distance.
Paired t-test
data: post_score and pre_score
t = -14.238, df = 42, p-value < 2.2e-16
alternative hypothesis: true mean difference is less than 0
95 percent confidence interval:
-Inf -26.80479
sample estimates:
mean difference
-30.39535
The results from the pre-test (M = 43.9, SD = 7.3) and post-test (M = 13.5, SD = 13.7) CAPS interview indicate that the treatment significantly reduced symptoms of PTSD in veterans with war-related trauma, t(42) = -14.2, p < 0.001.
Cohen's d
d estimate: 2.763555 (large)
95 percent confidence interval:
lower upper
2.16395 3.36316
The effect size for this analysis (d = 2.76) was found to exceed Cohen’s (1988) convention for a large effect (d = .80).
4.2 Non-Veterans
For the non-veteran group, the following hypotheses can be formulated:
Null Hypothesis (\(H_0\)): The treatment has no effect on the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being greater than or equal to zero (\(CAPS_{post} - CAPS_{pre} \ge 0\)).
Alternative Hypothesis (\(H_1\)): The treatment does decrease the CAPS score. This is represented by the difference between the pre-treatment and post-treatment scores being less than zero (\(CAPS_{post} - CAPS_{pre} < 0\)).
These hypotheses can be tested using a paired t-test. The effect size can be quantified by Cohen’s distance.
Paired t-test
data: post_score and pre_score
t = -9.8795, df = 42, p-value = 8.046e-13
alternative hypothesis: true mean difference is less than 0
95 percent confidence interval:
-Inf -21.20696
sample estimates:
mean difference
-25.55814
The results from the pre-test (M = 42.3, SD = 8) and post-test (M = 16.8, SD = 17.5) CAPS interview indicate that the treatment significantly reduced symptoms of PTSD in veterans with war-related trauma, t(42) = -9.9, p < 0.001.
Cohen's d
d estimate: 1.878293 (large)
95 percent confidence interval:
lower upper
1.363465 2.393121
The effect size for this analysis (d = 1.88) was found to exceed Cohen’s (1988) convention for a large effect (d = .80).
5 Reliable Change Index
The Reliable Change Index (RCI) is a statistical tool used to determine whether a change in a test score is meaningful or simply due to random measurement error. The RCI takes into account the reliability of the measure, the variability in the obtained scores in the group, and the change in the individual’s score and yields a pseudo–z-statistic. It is important to note that the RCI is only a rough estimate of the meaningfulness of a change and does not guarantee that a change is real or significant.
We computed the RCI using a test-retest reliability of 0.73 and the pooled standard deviations of the test scores. A patient was considered to have improved if the CAPS score decreased with 13.25 points. From the figure it is interesting to observe that for the non-veterans improvement and no PTSD diagnosis post-treatment appear to coincide, whereas this is not the case for veterans. There are veterans that did improve on their scores, but remained with the PTSD diagnosis post-treatment.
In the veteran group compared to the non-veteran group, more participants improved and less participants remained unchanged according to the RCI. In the non-veteran group, two people deteriorated.
Deteriorated Unchanged Improved
Non-Veteran 1 11 31
Veteran 0 5 38
However, if we look at the diagnosis of PTSD post-treatment, we observe equal numbers.
No PTSD PTSD
Non-Veteran 34 9
Veteran 32 11
6 Group Differences
In this chapter we will explore the differences between the veterans and non-veterans.
6.1 Statistical Analysis
In this study we investigate the variations in CAPS scores pre- and post-treatment among veterans and non-veterans, using a Bayesian repeated measures ANOVA. This method involves creating a model comparison that incorporates various fixed effects. The null model includes a fixed effect for time and random intercepts and slopes for time per person. The first model includes all the components of the null model, along with a fixed effect for veteran status. The second model includes all the components of the first model and also incorporates an interaction term between veteran status and time.
In our analysis, we use the Bayes factor to compare two distinct models by computing the ratio of their marginal likelihoods. In particular, we utilize the \(\text{BF}_{01}\), which gauges the relative strength of evidence in favor of the null model compared to an alternative model. To determine the degree of evidence strength, we follow the widely used guidelines proposed by Lee and Wagenmakers (2013).
In our analysis, we adopted an uninformative prior for the model prior, which entails assigning equal prior probability to each of the competing models before observing any data. This means that we did not express any prior preference or bias towards any particular model. By using an uninformative prior, we allowed the data to have a greater impact on the posterior distribution over models.
In our analysis, we employed the standard multivariate Cauchy distribution in JASP (version 0.17.1) as the prior for the effects. To ensure the robustness of our results, we also calculated various Bayes factors for a range of scale parameters, from 0.0 to 1.5, for the fixed effects.
6.2 Results
| Models | P(M) | P(M|data) | BF\(_\text{M}\) | BF\(_\text{01}\) | error % |
|---|---|---|---|---|---|
| Null model | 0.333 | 0.739 | 5.664 | 1 | |
| Veteran | 0.333 | 0.18 | 0.44 | 4.097 | 2.456 |
| Veteran + Veteran × Time | 0.333 | 0.081 | 0.175 | 9.173 | 2.698 |
According to the first table, the model probability of the null model increased after observing the data, while the probabilities of the other models decreased. Specifically, the data were four times more likely to have originated from the null model than from model one, and nine times more likely to have come from the null model than from model two. These findings suggest moderate evidence in favor of the null hypothesis, indicating that there is no fixed effect of veteran status and no interaction between veteran status and time.
| Effects | P(incl) | P(excl) | P(incl|data) | P(excl|data) | BF\(_\text{excl}\) |
|---|---|---|---|---|---|
| veteran | 0.667 | 0.333 | 0.261 | 0.739 | 5.664 |
| Time ✻ veteran | 0.333 | 0.667 | 0.081 | 0.919 | 5.706 |
In the second table, we observe that when we average across the models, the evidence for excluding the fixed effects is 5.5 times greater than the evidence for including them. This suggests that the data support the null hypothesis more strongly than the alternative hypotheses that include the fixed effects.
| Variable | Level | Mean | SD | Lower CI | Upper CI |
|---|---|---|---|---|---|
| Intercept | 29.49 | 1.066 | 27.353 | 31.592 | |
| Veteran | Non-Veteran | 0.661 | 1.017 | -1.401 | 2.686 |
| Veteran | -0.661 | 1.017 | -2.730 | 1.357 | |
| Time | Pre-score | 14.074 | 0.828 | 12.410 | 15.688 |
| Post-score | -14.074 | 0.828 | -15.832 | -12.516 | |
| Veteran × Time | Non-Veteran & Pre-score | -0.902 | 0.780 | -2.488 | 0.656 |
| Non-Veteran & Post-score | 0.902 | 0.780 | -0.669 | 2.475 | |
| Veteran & Pre-score | 0.902 | 0.780 | -0.669 | 2.475 | |
| Veteran & Post-score | -0.902 | 0.780 | -2.488 | 0.656 |
The final table displays the Model Averaged Posterior Summary, which presents the 95% credible intervals for the fixed effects. Notably, the credible intervals for the fixed effects of veteran status and the interaction between time and veteran status include zero.
6.3 Robustness
In Bayesian analysis with JASP, the choice of Cauchy prior width for the fixed effects can impact the strength of evidence supporting the null hypothesis. Setting the prior width to a small value suggests a strong belief in the effect of fixed factors, resulting in less mass in the posterior distribution close to zero, thus weaker evidence for the null hypothesis. Conversely, a larger prior width implies weaker belief in the effect of fixed factors and more mass close to zero, resulting in stronger evidence supporting the null hypothesis.
The figure above displays the robustness of evidence across different Cauchy prior widths, indicating moderate to strong evidence in favor of the null model across all plausible scale parameters according to Kruschke (2011).
7 References
Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299-312. https://doi.org/10.1177/1745691611406925
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press.